Business Intelligence: A Guide for Small to Medium Sized Businesses


Introduction to Business Intelligence

Having access to data insights is vital to staying competitive in the modern age. However, navigating the sea of new vocabulary, service providers, software and techniques is a problem that many SMEs face. Not to fear though: we have curated this go-to guide to walk you through the key topics and what they mean for businesses like yours.

The starting point is to understand the key concepts that make up business intelligence:

  • Data Management
  • Data Science
  • Analytics
  • Machine Learning/AI

Data Management

This is as the name suggests. It's everything data, covering areas such as data collection, storage, transformation, formatting and warehousing. The importance of good data management cannot be overstated: if you get this stage wrong, you will not get value from analysis. So where do you start? Well, chances are your business already collects data. This could be website traffic or sales data, and often this data is gathered by software and apps such as QuickBooks or Google Analytics. All you have to do is learn to store this data in the right place.

Data Science

Firstly, Data Science is not as scary as it sounds. If you're a small to medium-sized business that is starting out, keeping it simple is your best option. Trying to use complex equations that you're not familiar with will just leave you perplexed. So, what is Data Science? Well, it's the art (and we mean art) of deriving human-comprehensible insights from data. We use the term art deliberately, as there is no black-and-white answer: what you're actually trying to achieve is a deeper level of understanding, not a definitive answer.

Analytics

This is the metrics, dashboards and visuals you need for day-to-day operations, answering questions such as "how many website visitors do we have?" or "what's our average sale value?". Without answers to these key questions you're taking a stab in the dark when it comes to decision making. The most important thing to remember with analytics is that it needs to be up to date and holistic. There's no point leaving out metrics for the sake of your own vanity...

Machine Learning / Artificial Intelligence

AI is as useful as it is confusing; it's developing so quickly that trying to understand what it is and what it can do is a constant battle. The good news is that a large part of new developments occur in academia and, despite how interesting they are, they don't bear much weight from a business perspective. The main thing to understand about AI is that it learns to reduce its errors, not to increase its accuracy. That might sound like an oxymoron, but it isn't: no, AI isn't perfect (neither are humans, but let's not get philosophical). What happens is that the AI's creator deems it good enough to use. For businesses, this means that if you are going to rely upon an AI solution, you need to accept that it will be wrong sometimes... maybe quite a lot. Most providers don't publish any details about their models (and most models get worse in production, so test results can be misleading), and because of this lack of transparency you need to be vigilant when it comes to implementation.






Data Management

There are four basic processes in data management: collect, enrich, transform and load.

Collect: The stage where data is collected. This could be by a machine like a computer or a person. Keep in mind the way in which data is collected can have a profound impact on the end result of analysis.

Enrich: Enhancing data with additional information. This could be converting a currency or translating text.

Transform: Formatting the data in a sensible and scalable way. This could be changing date formats to be uniform or transposing data to avoid poor performance tables in your database.

Load: Putting data where it can be managed and accessed by the right people without interrupting workflows. This could be a file, database, data warehouse or cloud storage bucket.
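The four stages above can be sketched in a few lines of Python. The field names, rows and the exchange rate here are all invented for illustration, and a plain list stands in for the final data store:

```python
from datetime import datetime

# Collect: raw rows as they might arrive from an export (illustrative data).
raw_sales = [
    {"id": "ACB-123", "amount_usd": "120.50", "date": "03/01/2024"},
    {"id": "ACB-124", "amount_usd": "75.00", "date": "04/01/2024"},
]

USD_TO_GBP = 0.8  # Enrich: a fixed example rate; a real pipeline would look this up.

def enrich(row):
    # Add a GBP value alongside the original USD amount.
    row["amount_gbp"] = round(float(row["amount_usd"]) * USD_TO_GBP, 2)
    return row

def transform(row):
    # Transform: normalise the date to ISO format (assuming UK day-first input).
    row["date"] = datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat()
    return row

def load(rows, store):
    # Load: here "loading" is just appending to a list standing in for a table.
    store.extend(rows)

warehouse = []
load([transform(enrich(r)) for r in raw_sales], warehouse)
print(warehouse[0]["date"], warehouse[0]["amount_gbp"])  # → 2024-01-03 96.4
```

In a real pipeline each stage would be a separate, monitored step, but the order of operations is exactly this.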

To start any data project it’s important to have a clear understanding of the questions you are looking to answer. To do this you need to break down your objectives into parts, then consider which simple questions, when answered, will give you the context you need to make a decision. For example, if your organisation was looking to increase productivity, then “Where is the majority of time spent?”, “Where is the most value created?” or “What tasks are repetitive?” would all be initial questions to answer to achieve your aim.

Once you have these questions you can consider what data you need, whether that be CRM data, sales data or a public data source like weather patterns. Consider the finer details too. Does this tool or process cause bias? Who created this dataset? Does it miss some data? Knowing what you need before looking for sources will save time when looking for solutions. This process will give you a checklist of requirements to look for, rather than trying to do ad hoc collection when you’re missing some data.


Once you're collecting data you can look into enriching that data to derive more value in analysis. It makes sense to do tasks such as converting currencies, adding geographic data to postcodes or even running it through machine learning models to add sentiment to text or label data automatically.

Doing this now will keep model sizes smaller and performance up when creating reports and dashboards later. There are lots of tools you can use to enrich data, but I would focus on understanding APIs and some simple SQL. APIs (Application Programming Interfaces) allow users to make a request to a service over the web and get a specified output back. You will mainly use two types of request: GET and POST. A GET request simply returns a piece of information. A POST carries inputs that may add data to a database, and can also return a conditional response based on the input parameters.
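As a minimal sketch of the GET/POST difference, the Python standard library can build both kinds of request. The endpoint URL below is a made-up placeholder, and the requests are constructed but deliberately never sent:

```python
import json
import urllib.request

# Hypothetical enrichment endpoint -- replace with your provider's real URL.
BASE_URL = "https://api.example.com/v1/sentiment"

# GET: ask the service for a piece of information via the URL itself.
get_req = urllib.request.Request(BASE_URL + "?text=great+service", method="GET")

# POST: send input data in the request body and receive a response based on it.
payload = json.dumps({"text": "great service"}).encode("utf-8")
post_req = urllib.request.Request(
    BASE_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(get_req) would perform the call; we stop short of
# sending anything here since the endpoint is fictional.
print(get_req.get_method(), post_req.get_method())  # → GET POST
```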


The next stage of the process is applying transformations. This usually means tasks such as making dates uniform (trust me, dates often cause issues: much software uses the American date format, which can cause nightmares when modelling alongside the UK format), ranking data, filtering out unneeded data, or transposing data. I would keep two things in mind here... narrow tables and sensible identifiers/keys.

Narrow tables (tables with few columns) will allow you to scale data projects without large drops in performance, and the same goes for identifiers. If you're storing a large string like 'ACB-123-XYZ-QRF-999-HAT' for every row in a table, then you are going to waste lots of memory and ruin performance. Best practice is to use short numerical identifiers.
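A small sketch of both transformations together, assuming the incoming dates are US-formatted and using made-up composite identifiers:

```python
from datetime import datetime

# Raw rows with US-format dates and long composite string identifiers.
rows = [
    {"id": "ACB-123-XYZ-QRF-999-HAT", "date": "01/31/2024"},
    {"id": "BBD-456-LMN-TUV-111-CAP", "date": "02/01/2024"},
]

# Transform dates to unambiguous ISO 8601 (YYYY-MM-DD).
for row in rows:
    row["date"] = datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat()

# Replace long string keys with short numerical surrogate keys, keeping a
# lookup table so the original identifiers aren't lost.
key_lookup = {}
for row in rows:
    row["id"] = key_lookup.setdefault(row["id"], len(key_lookup) + 1)

print(rows[0])  # → {'id': 1, 'date': '2024-01-31'}
```

Storing the integer key in every table and the long string once in a lookup table is exactly the narrow-table idea in miniature.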

Finally it’s time to load your data. Most likely this data will be loaded into a database or a data warehouse. This will depend on a number of factors but the main one is the size of data.

For a data warehouse to be good value for money you need to be storing not just lots of data but also a wide range of data. Otherwise I would suggest opting for a SQL database, as this will give good flexibility at reasonable cost and migrates easily to a data warehouse as your business grows. SQL isn't the only player in the market though: MongoDB, BigQuery and other NoSQL or warehouse technologies all have great performance and, depending on the task, could perform better than a traditional SQL database. The easiest way to pick one is to choose whatever is easiest to understand. Sometimes this decision is made for you. For example, many web application frameworks use MySQL as standard, so instead of confusing things, keep it simple and use that elsewhere too. This applies mainly to small businesses, as budget is always key and sticking to one solution will mean paying less for support/consulting.
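If you just want to get hands-on with SQL before committing to anything, a file-based engine like SQLite ships with Python and is enough to practise the basics. The table and sales figures here are invented for illustration:

```python
import sqlite3

# An in-memory database; point this at a file path to persist it instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, amount_gbp REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, amount_gbp) VALUES (?, ?)",
    [("widget", 20.0), ("gadget", 45.0), ("widget", 20.0)],
)

# A typical analytics question: revenue per product.
revenue = conn.execute(
    "SELECT product, SUM(amount_gbp) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(revenue)  # → [('gadget', 45.0), ('widget', 40.0)]
conn.close()
```

The same schema and queries carry over almost unchanged to MySQL or PostgreSQL, which is part of why SQL is a safe starting point.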


Intro to Data Science

Data validity is all about how reproducible results are. For example, if you run an analysis on a sample of 1,000 people, how likely is the result to be the same with a different 1,000 people? If your sample is biased, then your results may not be accurate when compared to the general public. Data validity can be compromised by many things, such as biased samples, biased methods, missing data or time-sensitive data.

Sample bias occurs when samples are too small, not diverse or not representative. A great example is the fact that most Sociology and Psychology studies are conducted using samples of WEIRD (Western, Educated, Industrialised, Rich, and Democratic) people. This means that many theories may not hold up when applied to non-WEIRD people.

Biased methods are common. A classic example is a survey of internet usage that is conducted via an internet survey. Clearly people with little to no internet usage have no way of participating, therefore skewing the results towards higher usage.

Missing data involves some more conceptual thinking: you need to consider what data isn't present. For example, if you're a small business running search ads with a small budget, always remember when analysing results that your data only represents a small sample of the potential target market, and the results might not be the same if you had this missing data.

Time-sensitive data is pretty self-explanatory and can include concepts such as seasonal trends, or whether a correlation weakens as more time passes from a certain event.

What does this mean for your business? What are the costs involved, and what should you think about when recruiting a Data Scientist? Data Science is quickly turning into a vital tool for all businesses, and without it you risk being left behind by the competition. Good data science isn't cheap, but done well it achieves a great return on investment.

When thinking about implementation, consider your goals and your budget. Often for small businesses the cost-to-need ratio does not warrant an in-house team (expect to pay at least £70K in the first year to get set up), so a good solution is outsourcing. This will give you access to the analysis you need without constant overheads and recruitment headaches. If you're building an in-house team, consider the fact that 'Data Science' is a wide-ranging topic that gets more and more muddled each year, with more people calling themselves data scientists when in reality they fit an analyst role better. As a general rule, some form of test project should definitely be part of your hiring process. You want someone who can walk the walk, not just talk the talk.
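The internet-survey example of a biased method can be made concrete with a toy simulation. The usage figures are invented, and the "survey" simply excludes anyone with zero internet use:

```python
from statistics import mean

# Invented weekly internet hours for a small population, including non-users.
population_hours = [0, 0, 0, 2, 5, 8, 10, 14, 20, 21]

# An online survey can only reach people who are actually online.
survey_responses = [h for h in population_hours if h > 0]

true_average = mean(population_hours)
survey_average = mean(survey_responses)

# The biased method overstates average usage because non-users are invisible.
print(true_average, survey_average)  # → 8 11.428571428571429
```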


Creating Metrics and Visuals for Analytics

Analytics is most likely what you have in mind when you think about business intelligence. Good analytics will give you all the information needed to manage the day-to-day running of your business. Bad analytics will lull you into a false sense of certainty about how the business is actually performing. As a small to medium business, how do you ensure you aren’t being misled and are focusing on the right metrics? At a high level, metrics fall into two categories: vanity or sanity. Vanity metrics such as ‘total downloads’ tend to sound great but don’t tell you anything. Sanity metrics such as ‘active users’ give you much more insight.
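To make the distinction concrete, here is a toy event log (invented data): total downloads only ever goes up, while active users reflects who actually used the product recently:

```python
# Invented app events: (day, user, action).
events = [
    (1, "ann", "download"), (1, "ben", "download"), (2, "cat", "download"),
    (3, "ann", "open"), (3, "ann", "purchase"), (9, "ann", "open"),
]

# Vanity metric: sounds impressive, says nothing about ongoing engagement.
total_downloads = sum(1 for _, _, action in events if action == "download")

# Sanity metric: distinct users active (non-download events) in the last 7 days.
latest_day = max(day for day, _, _ in events)
active_users = {
    user for day, user, action in events
    if action != "download" and latest_day - day < 7
}

print(total_downloads, len(active_users))  # → 3 1
```

Three downloads sounds healthy; one active user tells you what is really going on.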

Off-the-shelf analytics (like those included in many apps) often fall into the vanity-metric category. It’s not in the provider’s best interest to show you areas of poor performance; they tend to come from the perspective of making something sound good. This is why creating bespoke analytics will give you far more value than those off-the-shelf tools.

The starting point for any project is picking a BI solution. Some of the most popular are Power BI, Tableau, Looker, Qlik and Data Studio. This will be how you calculate metrics, create graphs and visuals, and distribute them to your team in dashboards and reports. The costs of these products vary quite substantially, and each has its own benefits and drawbacks. Personally I’m a huge fan of Power BI, as I find it has the best ability to add custom Python code to both datasets and visuals. I also wouldn’t recommend Data Studio as, from my experience, it doesn’t cope well with large datasets; but then it is free, so how much can you expect?


Visualisations are a vital tool for analysis. They will help you understand trends, spot issues and generally interpret large amounts of data quickly. There are a few things to consider though. Let’s begin with scales. Charts will often automatically set the scale for the data provided, which is great in some respects and terrible in others. When comparing data in particular, auto-changing scales mean that trends (or non-trends) can easily mislead, looking far bigger or smaller than they actually are. I like fixed scales because I know exactly what I’m looking at and can easily compare data. The other thing to consider is which visual is going to be best for your data.

Some helpful tips are:

  • Line charts are good for spotting trends in time series data but become very difficult to interpret when there are many categories
  • Pie charts should always be bar charts (Pies give you no way of accurately comparing values)
  • Averages should be accompanied by either distributions, standard deviations or variance, as on their own averages can be very misleading.
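The point about averages can be shown with two invented weeks of daily sales: identical means, very different realities:

```python
from statistics import mean, stdev

# Two invented weeks of daily sales with the same average.
steady_week = [100, 100, 100, 100, 100]
spiky_week = [0, 0, 0, 0, 500]

print(mean(steady_week), stdev(steady_week))  # same mean, zero spread
print(mean(spiky_week), stdev(spiky_week))    # same mean, huge spread
```

A dashboard showing only the average would report both weeks as identical; the standard deviation immediately reveals they are nothing alike.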

Leveraging Artificial Intelligence / Machine Learning

First off, let me clarify a couple of things. Machine Learning and Artificial Intelligence are essentially the same thing: generally, programmers say ‘machine learning’ and marketers say ‘AI/artificial intelligence’. Which leads me on to my next point. The AI industry is full of fancy marketing that doesn’t give an accurate representation of the solutions. Many products make claims that, when investigated, either don’t make sense or simply can’t work, and many are conceptually flawed or easily broken. Generally speaking, AI does not perform well at predicting social outcomes, which is why many HR/recruitment AI products are simply fanciful. Don’t take our word for it; it’s well documented. Take this Princeton professor’s presentation, for example.

So what is good AI and how do you leverage it for your business? Generally AI is good for a number of tasks such as:

  • Predicting numerical values
  • Grouping data
  • Labelling data
  • Recommendation engines

There are a number of ways you can utilise these for your business, such as using sentiment analysis on customer interactions on social media to quickly identify people having bad experiences and flag them to customer service teams.
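As a toy illustration of that workflow (a real solution would use a trained sentiment model, not a hand-written word list, and these messages are invented):

```python
# Invented keyword list standing in for a real sentiment model.
NEGATIVE_WORDS = {"terrible", "broken", "refund", "disappointed", "worst"}

messages = [
    "Love the new update, works great!",
    "Order arrived broken and I want a refund.",
    "Honestly the worst support experience ever.",
]

def needs_followup(message):
    # Flag a message if it contains any negative keyword.
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & NEGATIVE_WORDS)

# Messages to route to the customer service team.
flagged = [m for m in messages if needs_followup(m)]
print(len(flagged))  # → 2
```

The structure (score each message, flag the bad ones, route them to a human) stays the same when you swap the word list for an off-the-shelf sentiment API.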

Another interesting use is looking for unusual network activity for cyber security, or you can even use it to predict values such as property prices!

Costs for machine learning vary widely. Off-the-shelf tools such as sentiment analysis are low cost and therefore represent good value. However, creating your own machine learning models can be an expensive journey. Usually when we are developing a model we start off as simple as can be, using pre-trained models or a simple model where possible to get initial results and see if the idea is worth further investment. These results also act as a good baseline for comparison if you do decide to go down the completely custom route.
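The baseline idea is worth sketching: before paying for a custom model, check how well a trivial predictor does on your data. The property prices below are invented:

```python
from statistics import mean

# Invented historical property prices (in £k) and the actual next sales.
training_prices = [250, 270, 310, 290, 280]
actual_new_sales = [300, 260, 295]

# Simplest possible baseline: always predict the historical mean.
baseline_prediction = mean(training_prices)

# Mean absolute error of the baseline; any custom model must beat this
# by enough to justify its cost.
baseline_mae = mean(abs(actual - baseline_prediction) for actual in actual_new_sales)
print(baseline_prediction, round(baseline_mae, 1))  # → 280 18.3
```

If an expensive custom model only shaves a little off that error, the investment probably isn't worth it.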


Related resources you might like

  • Inside the blue box - The data that runs Breakfast at Tiffanys
  • Anomaly Detection within Big Data
  • Utilisation of Data Science - SMEs VS Large Corporations